
Nature Biotechnology

Springer Science and Business Media LLC

All preprints, ranked by how well they match Nature Biotechnology's content profile, based on 147 papers previously published here. The average preprint has a 0.35% match score for this journal, so anything above that is already an above-average fit. Older preprints may already have been published elsewhere.

1
ACCIO: An Assembly-Based Tool Enabling Plasmid Detection

Raabe, N. J.; Griffith, M. P.; Rangachar Srinivasa, V.; Waggle, K. D.; Sundermann, A. J.; Pless, L.; Snyder, G. M.; Brooks, M. M.; Van Tyne, D.; Harrison, L. H.

2025-11-02 infectious diseases 10.1101/2025.10.30.25338662 medRxiv
Top 0.1%
50.9%

Plasmids are extrachromosomal mobile genetic elements that often carry genes responsible for antimicrobial resistance. Plasmid epidemiology aims to track the evolution and spread of plasmids, but the field currently faces significant barriers that make practical implementation using whole genome sequence data difficult. Hybrid-assembled genomes remain the most reliable way to identify and track complete plasmids; however, most genomic surveillance data exist as short-read sequences, which lack the resolution required to resolve plasmids accurately. Despite recent advances, long-read-only assemblies have not yet reached the consistency of hybrid assemblies. The ideal approach to plasmid epidemiology using whole genome sequence data would consider the limitations of sequencing technologies and the constraints of existing genomic surveillance infrastructure, in addition to the unique evolutionary biology of plasmids. Here, we present ACCIO (Assembly-based Circular Contig Identification for Outbreaks), a tool that creates a reference plasmid database and uses it to infer which plasmids, and which groupings of genetically related plasmids, are present in an input assembly (Illumina, Nanopore, or hybrid). We validated ACCIO using an internal dataset of 303 plasmid-harboring bacterial clinical and surveillance isolates collected from a single acute tertiary care center. When highly related database plasmids were grouped together, ACCIO achieved 100% sensitivity and 92.1% positive predictive value (PPV) for detection of plasmid groups using hybrid assemblies, and comparably strong performance for Illumina (93.0% sensitivity, 86.6% PPV) and Nanopore (79.3% sensitivity, 91.4% PPV) assemblies. Evaluation on three external datasets yielded consistently high performance. Finally, when benchmarked against MOB-suite, a tool for reconstruction and typing of plasmids, ACCIO demonstrated superior performance across nearly all assembly types and plasmid grouping levels. By integrating database construction, clustering, and plasmid calling into a single workflow compatible with all major sequencing platforms, ACCIO is intended to help advance plasmid epidemiology beyond its current technological and infrastructural barriers.

Impact statement: Detecting and tracking plasmids (the mobile genetic elements often responsible for spreading antimicrobial resistance in hospital settings) is challenging, particularly when relying on short-read sequencing data alone. Short-read genome assemblies, despite their widespread use in surveillance of bacterial pathogens, inherently lack the resolution required for plasmid analyses. Current bioinformatic methods struggle to identify whole plasmids from short-read assemblies alone, and hybrid assembly using both short- and long-read data is often required for the robust analyses that are essential for tracking plasmids. To address these challenges, we developed ACCIO, a bioinformatics tool that uses input genome assemblies (short-read, long-read, or hybrid) to assess the plasmid content of clinical bacterial isolates for epidemiologic purposes. We validated it against the recovery of circular plasmid sequences from hybrid-assembled genomes, used as the gold standard for determining plasmid content. Using a curated local database of 430 plasmid sequences, ACCIO provided accurate inferences of plasmid content from short-read (Illumina), long-read (Oxford Nanopore Technologies), and hybrid assemblies (short- and long-read data combined), facilitating genomic surveillance of plasmids regardless of sequencing technology. This work represents a meaningful step forward in advancing plasmid surveillance beyond the technological and infrastructural barriers that limit its broader expansion into healthcare and other settings.

Data summary: Short- and long-read sequencing data have been deposited in the NCBI Sequence Read Archive (SRA) under multiple BioProjects, and corresponding hybrid genome assemblies are available in GenBank. Accession numbers for all BioProjects, BioSamples, and SRA datasets are provided in Supplementary Data S1. All supporting data, software code, and experimental/analysis protocols are provided within the article or in supplementary data files. External validation of ACCIO used three external datasets (Cho et al. 2023, BioProjects PRJNA475751 and PRJNA874473, DOI: 10.1038/s41598-024-70540-1; Lipworth et al. 2024, BioProject PRJNA604975, DOI: 10.1038/s41467-024-45761-7; Khezri et al. 2021, European Nucleotide Archive (ENA) PRJEB45084, DOI: 10.3390/microorganisms9122560).

List of external software:
- MOB-suite (v3.1.9) - https://github.com/phac-nml/mob-suite
- Skani (v0.2.2) - https://github.com/bluenote-1577/skani
- Scipy (v1.16.1) - https://github.com/scipy/scipy
- Pling (v2.0.0) - https://github.com/iqbal-lab-org/pling
- MUMmer / NUCmer (v4.0.1) - https://mummer4.github.io/
- Mash / Mash Screen (v2.3) - https://github.com/marbl/Mash
- SPAdes (v3.15.5) - https://github.com/ablab/spades
- Unicycler (v0.5.1) - https://github.com/rrwick/Unicycler
- Flye (v2.9.5) - https://github.com/mikolmogorov/Flye
- QUAST (v5.2.0) - https://github.com/ablab/quast
- Kraken2 (v2.1.3) - https://github.com/DerrickWood/kraken2
- CheckM (v0.4) - https://github.com/Ecogenomics/CheckM
- Albacore/Guppy - [no longer officially hosted; was distributed by ONT]
- Guppy - https://nanoporetech.com/software/other/guppy
- Dorado - https://github.com/nanoporetech/dorado
- Bowtie2 (v2.5.4) - https://github.com/BenLangmead/bowtie2
- Minimap2 (v2.28) - https://github.com/lh3/minimap2
- Biopython (v1.85) - https://biopython.org/
- Pandas (v2.3.1) - https://pandas.pydata.org/
- Plasme (v1.1) - https://github.com/HubertTang/PLASMe
- BLAST (v2.17.0) - https://blast.ncbi.nlm.nih.gov/Blast.cgi
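The sensitivity and PPV figures above follow the standard definitions, sensitivity = TP / (TP + FN) and PPV = TP / (TP + FP). A minimal Python sketch of how such figures could be tallied from per-isolate plasmid-group calls is shown below. It assumes that true positives, false negatives and false positives are pooled across isolates, which may not match the preprint's exact counting scheme, and the function, isolate and group names are hypothetical.

```python
from typing import Dict, Set, Tuple


def pooled_sensitivity_ppv(
    truth: Dict[str, Set[str]], calls: Dict[str, Set[str]]
) -> Tuple[float, float]:
    """Pool TP/FN/FP over isolates and return (sensitivity, PPV).

    truth: plasmid groups recovered from hybrid assemblies (gold standard).
    calls: plasmid groups inferred by the tool for the same isolates.
    Pooling across isolates is an assumption of this sketch.
    """
    tp = fn = fp = 0
    for isolate in truth.keys() | calls.keys():
        expected = truth.get(isolate, set())
        observed = calls.get(isolate, set())
        tp += len(expected & observed)  # groups correctly detected
        fn += len(expected - observed)  # groups missed
        fp += len(observed - expected)  # groups called but absent
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    return sensitivity, ppv


# Hypothetical example: one missed group (group_B) and one spurious call (group_D).
truth = {"isolate_1": {"group_A", "group_B"}, "isolate_2": {"group_C"}}
calls = {"isolate_1": {"group_A", "group_D"}, "isolate_2": {"group_C"}}
print(pooled_sensitivity_ppv(truth, calls))
```

In this toy case there are 2 true positives, 1 false negative and 1 false positive, so both sensitivity and PPV equal 2/3.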

2
Unbiased and UMI-informed sequencing of cell-free miRNAs at single-nucleotide resolution

van Eijndhoven, M. A. J.; Aparicio-Puerta, E.; Gomez-Martin, C.; Medina, J. M.; Drees, E. E. E.; Bradley, E. J.; Bosch, L.; Scheepbouwer, C.; Hackenberg, M.; Pegtel, D. M.

2021-05-04 genomics 10.1101/2021.05.04.442244 medRxiv
Top 0.1%
50.7%

Terminal nucleotidyl transferases are enzymes that add non-templated nucleotides to RNA molecules. In the case of microRNAs, this process has been shown to be functionally relevant for their maturation and for the generation of isomiRs with non-canonical mRNA targets. Deconvolution of these post-transcriptional modifications is particularly challenging for extracellular miRNAs, which are considered targets for minimally invasive diagnostics. Massively parallel RNA sequencing is the only method that can faithfully reveal isomiR diversity in biological samples and determine relative quantities. Despite improvements, current small RNA sequencing strategies remain imprecise. We developed IsoSeek, which diverges from these methods by using randomized 5′ and 3′ adapters combined with a 10N unique molecular identifier (UMI). Using synthetic miRNA and isomiR spike-in sets and testing depletion and RNA competition strategies in 7 sequencing rounds of >100 samples, we rigorously optimized and validated the technical accuracy of the IsoSeek method. In genetically altered HEK293 cells, we characterized the terminal uridylyltransferase (TUT4/TUT7)-dependent miRNA uridylome and discovered extensive uridylation of disease-associated miRNAs. Notably, 3′-uridylated isomiR profiles of plasma extracellular vesicles (EVs) depend on UMI correction. Thus, IsoSeek advances our knowledge of cell-free miRNAs and supports their development into non-invasive biomarkers.
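UMIs help because PCR duplicates of one captured molecule share the same UMI, so counting distinct UMIs rather than raw reads gives molecule-level counts. The sketch below illustrates this under simplifying assumptions: reads are already trimmed and split into an (insert sequence, UMI) pair, and UMIs are used exactly as read, with no correction of sequencing errors within the UMI. The function name is hypothetical, and this is not the IsoSeek pipeline itself.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

UMI_LENGTH = 10  # "10N" UMI, i.e. 10 random nucleotides


def umi_collapsed_counts(reads: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count distinct UMIs per insert sequence (one count per captured molecule)."""
    umis_per_insert: Dict[str, Set[str]] = defaultdict(set)
    for insert, umi in reads:
        if len(umi) != UMI_LENGTH or "N" in umi:
            continue  # skip truncated UMIs or UMIs with ambiguous base calls
        umis_per_insert[insert].add(umi)
    return {insert: len(umis) for insert, umis in umis_per_insert.items()}


let7a = "TGAGGTAGTAGGTTGTATAGTT"  # hsa-let-7a-5p written as DNA
let7a_u = let7a + "T"              # 3' mono-uridylated isomiR
reads = [
    (let7a, "ACGTACGTAC"), (let7a, "ACGTACGTAC"), (let7a, "ACGTACGTAC"),  # PCR duplicates
    (let7a_u, "GGGCCCAAAT"), (let7a_u, "TTTAAACCCG"),                     # two molecules
]
print(umi_collapsed_counts(reads))
```

Here raw read counts (3 versus 2) would rank the canonical form above the uridylated isomiR, whereas UMI-collapsed counts (1 versus 2) reverse that order, which is the kind of distortion UMI correction is meant to remove.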

3
High Resolution Spatial Mapping of Microbiome-Host Interactions via in situ Polyadenylation and Spatial RNA Sequencing

Ntekas, I.; Takayasu, L.; McKellar, D. W.; Grodner, B. M.; Holdener, C.; Schweitzer, P. A.; Sauthoff, M.; Shi, Q.; Brito, I. L.; De Vlaminck, I.

2024-11-18 genomics 10.1101/2024.11.18.624127 medRxiv
Top 0.1%
50.1%

Inter-microbial and host-microbial interactions are thought to be critical for the functioning of the gut microbiome, but few tools are available to measure these interactions. Here, we report a method for unbiased spatial sampling of microbiome-host interactions in the gut at one-micron resolution. This method combines enzymatic in situ polyadenylation of both bacterial and host RNA with spatial RNA sequencing. Application of this method in a mouse model of intestinal neoplasia revealed the biogeography of the mouse gut microbiome as a function of location in the intestine, frequent strong inter-microbial interactions at short length scales, shaping of local microbiome niches by the host, and tumor-associated changes in the architecture of the host-microbiome interface. This method is compatible with broadly available commercial platforms for spatial RNA sequencing and can therefore be readily adopted to study the role of short-range, bidirectional host-microbe interactions in microbiome health and disease.

4
Multi-step genomics on single cells and live cultures in sub-nanoliter capsules

Mazelis, I.; Sun, H.; Kulkarni, A.; Torre, T.; Klein, A. M.

2025-03-17 genomics 10.1101/2025.03.14.642839 medRxiv
Top 0.1%
48.1%

Single-cell sequencing methods uncover natural and induced variation between cells. Many functional genomic methods, however, require multiple steps that cannot yet be scaled to high throughput, including assays on living cells. Here we develop capsules with amphiphilic gel envelopes (CAGEs), which selectively retain cells and large analytes while being freely accessible to media, enzymes and reagents. Capsules enable high-throughput multi-step assays combining live-cell culture with genome-wide readouts. We establish methods for barcoding CAGE DNA libraries, and apply them to measure persistence of gene expression programs in cells by capturing the transcriptomes of tens of thousands of expanding clones in CAGEs. The compatibility of CAGEs with diverse enzymatic reactions will facilitate the expansion of the current repertoire of single-cell, high-throughput measurements and extension to live-cell assays.

5
Optics-free reconstruction of 2D images via DNA barcode proximity graphs

Liao, H.; Kottapalli, S.; Huang, Y.; Chaw, M.; Gehring, J.; Waltner, O.; Phung-Rojas, M.; Daza, R. M.; Matsen, F. A.; Trapnell, C.; Shendure, J.; Srivatsan, S. R.

2024-08-08 genomics 10.1101/2024.08.06.606834 medRxiv
Top 0.1%
44.4%

Spatial genomics technologies include imaging- and sequencing-based methods. Sequencing-based spatial methods typically require surfaces coated with coordinate-associated DNA barcodes, but the physical registration of these barcodes to spatial coordinates is challenging, necessitating either high-density printing of oligonucleotides or in situ sequencing/probing of randomly deposited, DNA-barcode-bearing beads. As a consequence, the surface areas available to sequencing-based spatial genomic methods are constrained by the time, labor, cost and instrumentation required to either print or decode a coordinate-tagged surface. To address this challenge, we developed SCOPE (Spatial reConstruction via Oligonucleotide Proximity Encoding), an optics-free, DNA microscopy-inspired method. With SCOPE, the relative positions of DNA-barcoded beads within a 2D shape, 2D image or 3D volume are inferred from the ex situ sequencing of chimeric molecules formed from diffusing "sender" and tethered "receiver" oligonucleotides. To demonstrate the potential of this approach, we applied SCOPE to reconstruct 2D shapes, 2D images or 3D volumes defined by 10⁴ to 10⁶ DNA-barcoded beads of 20 to 100 µm, including an asymmetric "swoosh" resembling the Nike logo (44 mm²), a "color" Snellen eye chart (704 mm²) and the surface topology of 3D molds of a teddy bear, star, butterfly or block letter (75 to 100 mm³). Each of the resulting "DNA barcode proximity graphs" was computationally reconstructed in an automated fashion, across fields of view and at resolutions determined by sequencing depth, bead size and diffusion kinetics rather than by microarray or microscope instrument time. Because the ground truth shapes are known, these datasets may be particularly useful for the further development of computational algorithms in this nascent field.
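To make the idea of recovering positions from a proximity graph concrete, the sketch below embeds beads in 2D using the two smallest non-trivial eigenvectors of a normalized graph Laplacian (Laplacian eigenmaps). This is a standard technique swapped in for brevity, not necessarily the reconstruction algorithm SCOPE uses. It assumes the graph is connected, treats sender-receiver chimera counts as symmetric edge weights, and recovers the layout only up to rotation, reflection and scale; the function name and the grid test case are hypothetical.

```python
import numpy as np


def spectral_layout(n_beads, edges):
    """Embed beads in 2D from weighted proximity edges (i, j, chimera_count).

    Laplacian eigenmaps: eigenvectors of the symmetric normalized Laplacian,
    rescaled by D^-1/2, dropping the trivial first eigenvector.
    Assumes a connected graph (every bead has nonzero degree).
    """
    w = np.zeros((n_beads, n_beads))
    for i, j, count in edges:
        w[i, j] += count  # treat proximity as undirected
        w[j, i] += count
    degree = w.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    laplacian = np.diag(degree) - w
    l_sym = d_inv_sqrt[:, None] * laplacian * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(l_sym)  # eigenvalues in ascending order
    return d_inv_sqrt[:, None] * eigvecs[:, 1:3]


# Hypothetical test case: a 6 x 6 grid of beads, neighbors linked by 10 chimeras each.
side = 6
index = {(x, y): x * side + y for x in range(side) for y in range(side)}
edges = []
for (x, y), i in index.items():
    if x + 1 < side:
        edges.append((i, index[(x + 1, y)], 10))
    if y + 1 < side:
        edges.append((i, index[(x, y + 1)], 10))
coords = spectral_layout(side * side, edges)
```

Beads that share many chimeras end up close together in the embedding, so grid neighbors stay near each other and distant beads stay apart, even though no coordinates were supplied.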

6
DAMPA - accelerated and simplified design of probe panels for targeted metagenomics using pangenome graphs

Payne, M.; Tam, K. K.-G.; Rockett, R. J.; Basile, K.; Bowden, R.; Sintchenko, V.; Kok, J.; Golubchik, T.

2026-05-22 infectious diseases 10.64898/2026.05.15.26352859 medRxiv
Top 0.1%
42.5%

Targeted metagenomics, where samples are enriched for multiple organisms of interest using oligonucleotide probes, is a highly efficient sequencing methodology that is becoming standard practice for genomics of viruses and complex polymicrobial samples. Efficient enrichment critically requires probes that capture both conserved and highly diverse genomic regions without loss of sensitivity, and with uniform representation in the sequencing pool. Design of optimal probesets poses a challenge: existing computational methods use k-mer hashing to reduce over-abundant sequences, but scalability and efficiency drop with increasing numbers of genomes, while diverse sequences remain under-represented. Here we show that incorporating evolutionary distance to compress probes via a graph-based representation of multiple genomes across species, together with k-mer hashing, reduces overrepresentation of conserved sequences and yields more uniform coverage, even of highly diverse loci. We make the method available in Dampa, an open-source tool that generates probesets in seconds on a standard laptop.
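For context, the k-mer hashing baseline that the abstract contrasts Dampa with can be sketched as a greedy tiler: candidate probes are tiled along each genome, and a probe is kept only if most of its k-mers are not already in the hash set of previously selected probes. This is a generic illustration with hypothetical names and parameters (probe length 120, step 60, k = 31, at most 50% shared k-mers); it is not Dampa's pangenome-graph method.

```python
from typing import Iterable, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All k-mers of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def greedy_probe_selection(
    genomes: Iterable[str],
    probe_len: int = 120,
    step: int = 60,
    k: int = 31,
    max_shared: float = 0.5,
) -> List[str]:
    """Tile genomes into probes, skipping probes whose k-mers are mostly covered already."""
    selected: List[str] = []
    covered: Set[str] = set()  # hash set of k-mers from probes selected so far
    for genome in genomes:
        for start in range(0, max(len(genome) - probe_len, 0) + 1, step):
            probe = genome[start:start + probe_len]
            probe_kmers = kmers(probe, k)
            if not probe_kmers:
                continue  # genome shorter than k; nothing to score
            shared = len(probe_kmers & covered) / len(probe_kmers)
            if shared <= max_shared:
                selected.append(probe)
                covered |= probe_kmers
    return selected
```

Under this rule a genome identical to one already processed contributes no new probes, while a divergent genome contributes probes only where its k-mers differ from those already covered.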

7
Capsule-Based Single-Cell Genome Sequencing

Mullaney, D. B.; Sgrizzi, S. R.; Mai, D.; Campbell, I.; Huang, Y.; Sinkunas, A.; Kerr, D. L.; Browning, V. E.; Eisenach, H. E.; Sims, J. N.; Nichols, E. K.; Lapointe, C. P.; Amimura, Y.; Harris, K.; Zilionis, R.; Srivatsan, S. R.

2025-03-17 genomics 10.1101/2025.03.14.643253 medRxiv
Top 0.1%
42.1%

Single-cell genomics methods have unveiled the heterogeneity present in seemingly homogeneous populations of cells; however, these techniques require meticulous optimization. How exactly does one handle and manipulate the biological contents of a single cell? Here, we introduce and characterize a novel semi-permeable capsule (SPC) capable of isolating single cells and their contents while permitting size-selective biomolecular exchange. These capsules remain stable under diverse physical and chemical conditions and allow selective diffusion of biomolecules, retaining larger biomolecules, including genomic DNA and cellular complexes, while permitting the exchange of smaller molecules such as primers and enzymes. We demonstrate the utility of SPCs for single-cell assays by culturing over 500,000 cellular colonies simultaneously, demonstrating efficient and unbiased nucleic acid amplification, and performing combinatorial indexing-based single-cell whole genome sequencing (sc-WGS). Notably, SPC-based sc-WGS provides uniform genome coverage and minimal cross-contamination, allowing detection of genomic variants with high sensitivity and specificity. Leveraging these properties, we conducted a proof-of-concept lineage tracing experiment using cells harboring the hypermutator polymerase ε allele (POLE P286R). Low-depth sequencing of 1,000 single-cell genomes captured lineage marks deposited throughout the genome during each cell division and enabled the subsequent reconstruction of cellular genealogies. Capsule-based sc-WGS expands the single-cell genomics toolkit and will facilitate the investigation of somatic variants, resolved to single cells at scale.

8
Carrierwave: A granular, incentive-aligned infrastructure for scientific communication

Bachelet, I.

2026-03-03 scientific communication and education 10.64898/2026.03.01.708795 medRxiv
Top 0.1%
41.9%

The peer-reviewed journal article imposes structural constraints on the dissemination, validation, and reuse of research outputs. Intermediate results, negative findings, methodological refinements, and replication attempts are systematically underrepresented in published literature, limiting visibility into ongoing research activity for both scientists and mission-driven funders. Here we present Carrierwave, an open infrastructure for continuous, granular scientific communication built on structured research objects (ROs), cryptographic provenance, blockchain-based attribution, and programmable incentive mechanisms. Each RO represents an atomic unit of scientific output (a single experimental result, negative finding, dataset, protocol, or replication) that is hashed for content integrity, stored in a persistent database, and optionally minted as an ERC-721 non-fungible token on the Ethereum blockchain. The system includes an on-chain bounty pool enabling funders to directly incentivize specific research activities, and an automated analysis layer that synthesizes disclosed ROs into continuously updated research landscape maps. We describe the system architecture, report on its implementation and deployment on Ethereum mainnet, and present a quantitative analysis of disease-specific publication frequency demonstrating the information latency problem that Carrierwave addresses. The distribution of publication frequency across disease areas is highly skewed, with the majority of conditions represented by fewer than four publications per year in high-impact biology journals. For diseases in the long tail, the interval between successive publications may span months or years. Publication frequency correlates poorly with disease burden, instead reflecting historical research community size and advocacy momentum.
By reducing the unit of communication to the individual research object and eliminating editorial gatekeeping as a prerequisite for disclosure, Carrierwave increases the effective sampling rate of scientific activity in precisely the domains where publication-based visibility is most sparse. The system is live at https://carrierwave.org.
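Content hashing of a research object can be illustrated with Python's standard library: serialize the object deterministically and hash the bytes, so that any change to its content changes the digest. The function name, the field names and the choice of SHA-256 over canonical JSON are assumptions for illustration; the abstract does not specify Carrierwave's RO schema or hash function.

```python
import hashlib
import json


def research_object_digest(ro: dict) -> str:
    """SHA-256 hex digest of a canonical JSON serialization (assumed scheme)."""
    canonical = json.dumps(ro, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical research object; key order does not affect the digest.
ro = {"type": "negative_finding", "title": "Compound X shows no effect in assay Y", "data": [0.1, 0.2]}
reordered = {"data": [0.1, 0.2], "type": "negative_finding", "title": "Compound X shows no effect in assay Y"}
assert research_object_digest(ro) == research_object_digest(reordered)
```

Sorting keys and fixing separators makes the serialization independent of key order and whitespace, so anyone holding the same content can recompute the same digest and check it against the recorded one.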

9
Catch and release of sialoglycoRNAs enables sequencing-based profiling across cellular and extracellular material

Flynn, R. A.; Ge, R.; Rai, S. K.; Coffey, R. J. A.; Jeppesen, D. K.; Zhang, Q.; Higginbotham, J. N.

2025-10-05 molecular biology 10.1101/2025.10.04.680438 medRxiv
Top 0.1%
40.9%
Show abstract

Glycosylated RNAs (glycoRNAs) represent a recently discovered class of small RNAs, but their systematic characterization has been limited by reliance on metabolic chemical reporters and high RNA input requirements. Here we present rPAL sequencing (rPAL-seq), a sensitive and selective platform for de novo discovery of sialoglycoRNAs. rPAL-seq combines enhanced periodate oxidation of sialic acids with a capture-release workflow and optimized library construction using poly(A) extension coupled with template-switching reverse transcription. The method enabled reproducible profiling from less than 100 ng of input RNA, corresponding to less than 2% of the material required by previous approaches. When applied across 13 human cell lines, rPAL-seq identified lineage-associated glycoRNA patterns alongside a conserved core dominated by uridine-rich snRNAs and snoRNAs, with modification signatures implicating glycosylation on acp3U or related uridine-based modifications. Extending to extracellular vesicles and non-vesicular nanoparticles, rPAL-seq revealed secreted glycoRNA profiles distinct from those of the cellular fraction. rPAL-seq provides a robust, scalable strategy for glycoRNA profiling, opening new avenues for studying this emerging biopolymer.

10
Dibenzocyclooctyne-modified PCR primers enable direct enzyme-free click chemistry ligation for custom nanopore amplicon sequencing

Lypaczewski, P.; Shapiro, B. J.

2026-04-21 genomics 10.64898/2026.04.18.719403 medRxiv
Top 0.1%
40.7%
Show abstract

Oxford Nanopore Technologies (ONT) rapid library preparation kits use transposase-mediated tagmentation to attach click chemistry-functionalized oligonucleotide duplexes to fragmented DNA, followed by click chemistry to conjugate Rapid Adapter (RA) sequencing adapters. A similar protocol is used in 16S rRNA gene amplicon and PCR-amplified rapid whole-genome sequencing workflows. Here, we describe custom oligonucleotides with dibenzocyclooctyne (DBCO) added onto PCR primer 5' termini. After standard PCR amplification, DBCO-modified amplicons react spontaneously with RA sequencing adapters, producing sequencing-ready libraries in minutes without enzymatic processing. All configurations employ an asymmetric design in which the DBCO modification is restricted to a single primer, leaving the opposite primer available for low-cost barcoding. We validate three primer architectures: (i) direct attachment of DBCO to a target-specific primer, (ii) a universal DBCO-modified oligonucleotide used in a two-step PCR workflow, and (iii) a three-primer single-pot reaction combining the universal DBCO oligonucleotide with unmodified target-specific primers. These configurations are validated using full-length 16S rRNA gene amplicons sequenced on a PromethION flow cell. DBCO-modified primers are synthesized either commercially or in-house via DBCO-TFP ester conjugation to 5'-amino oligonucleotides and remain fully active through standard PCR thermocycling. The best-performing configuration, a two-step PCR with a universal oligonucleotide, achieved higher pore occupancy and read yield than comparable commercial solutions. This approach reduces library preparation reagent costs relative to available kits: the initial synthesis cost is lower than that of existing amplicon sequencing kits, and a single synthesis provides enough material for hundreds or thousands of PCR reactions. The approach can also be applied to gene targets beyond 16S rRNA.

11
CarpeDeam: A De Novo Metagenome Assembler for Heavily Damaged Ancient Datasets

Kraft, L.; Soeding, J.; Steinegger, M.; Jochheim, A.; Fernandez-Guerra, A.; Renaud, G.

2024-08-09 genomics 10.1101/2024.08.09.607291 medRxiv
Top 0.1%
40.6%

De novo assembly of ancient metagenomic datasets is a challenging task. Ultra-short fragment size and characteristic postmortem damage patterns of sequenced ancient DNA molecules leave current tools ill-equipped for ideal assembly. We present CarpeDeam, a novel damage-aware de novo assembler designed specifically for ancient metagenomic samples. Utilizing maximum-likelihood frameworks that integrate sample-specific damage patterns, CarpeDeam demonstrates improved recovery of longer continuous sequences and protein sequences in many simulated and empirical datasets compared to existing assemblers. As a pioneering ancient metagenome assembler, CarpeDeam opens the door for new opportunities in functional and taxonomic analyses of ancient microbial communities.
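To illustrate what integrating a damage pattern into a likelihood can mean, the sketch below scores an ungapped read against a candidate sequence with a simple model in which a reference C is read as T with a position-dependent deamination probability that is highest at the 5′ end. The exponential damage profile, its parameters, the uniform error rate and the restriction to 5′ C→T damage are all assumptions for illustration, as are the function names. CarpeDeam's actual maximum-likelihood framework is not described in the abstract and is likely to differ.

```python
import math


def ct_damage_rate(pos: int, top: float = 0.3, decay: float = 0.5) -> float:
    """Assumed C->T deamination probability at distance pos from the 5' end."""
    return top * math.exp(-decay * pos)


def base_prob(ref: str, obs: str, pos: int, err: float, damage: bool) -> float:
    """P(observed base | reference base): uniform errors plus optional C->T damage."""
    delta = ct_damage_rate(pos) if damage and ref == "C" else 0.0
    p_err = (1.0 - delta) * err / 3.0  # each of the 3 wrong bases
    if obs == ref:
        return (1.0 - delta) * (1.0 - err)
    if ref == "C" and obs == "T":
        return delta + p_err  # damage or ordinary error
    return p_err


def read_log_likelihood(read: str, ref: str, err: float = 0.01, damage: bool = True) -> float:
    """Sum of per-base log-probabilities for an ungapped read-to-reference alignment."""
    return sum(
        math.log(base_prob(r, o, i, err, damage))
        for i, (r, o) in enumerate(zip(ref, read))
    )


ref = "CCGATTACGA"
read = "TCGATTACGA"  # C->T at the first 5' position, a typical ancient DNA signature
print(read_log_likelihood(read, ref, damage=True) > read_log_likelihood(read, ref, damage=False))
```

Under this model a terminal C→T mismatch is penalized far less than under a damage-free model, which is the intuition behind damage-aware scoring: mismatches consistent with postmortem damage are not mistaken for true sequence differences.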

12
Solution-phase indexing by kinetic confinement enables rapid, simple, and instrument-free single cell transcriptional profiling

Marafini, P.; Smith, D. G.; Lamstaes, A. R.; Contreras, R. E.; Williams, I.; West, I.; Ambridge, O.; Sanders-Brown, V.; Intaite, E.; Hii, C. Y.; Hume, B. C.; Munagala, U.; Plumbly, W.; Brown, F. L.; Shlyakhtina, Y.; Woods, L.; Bibby, J. A.; Williams, L.; Yang, J. H.; Steffy, B.; Zawada, L.; Harger, J. W.; McKenzie, D.; Laing, A. G.; Stubbington, M. J.; Edelman, L. B.

2024-11-02 genomics 10.1101/2024.11.01.621570 medRxiv
Top 0.1%
40.3%

Existing tools for single cell genomics require complex physical frameworks for the indexing of cellular nucleic acids, including proprietary instrumentation, droplet emulsions, and laborious combinatorial indexing schemes. The complexity and cost of these tools significantly constrain the use of single cell technologies across basic and translational research. Here, we describe an instrument-free method that uses novel, bifunctional indexing reagents to deliver index sequences directly to single cells, followed by a biophysical process known as Kinetic Confinement, to perform high-fidelity indexing of target molecules across thousands of single cells simultaneously in single-tube, solution-phase reactions. Kinetic Confinement enables simple, fast, and flexible single cell experiments, and allows straightforward scaling to very large sample numbers. We anticipate that assays based on Kinetic Confinement will significantly expand the scope, use, and impact of single cell analysis across fundamental and applied research, as well as within therapeutic development and ultimately applied clinical diagnostics.

13
ESPeR-seq: Extremely Sensitive and Pure, End-to-end, RNA-seq library preparation

Chen, H.-M.; Kao, J.-C.; Yang, C.-P.; Tan, C.; Lee, T.; Sugino, K.

2026-03-15 genomics 10.64898/2026.03.12.711386 medRxiv
Top 0.1%
40.2%

The Smart-seq family of methods represents the gold standard for high-sensitivity, full-length single-cell RNA sequencing. Despite iterative improvements, fundamental challenges remain: the generation of non-specific PCR products that limit sensitivity, the inability to capture precise Transcription End Sites (TES), and the insidious generation of "phantom UMIs" (artificial molecular barcodes created during PCR that systematically inflate molecular counts). Here, we present ESPeR-seq, a novel architecture that resolves these barriers. To enable precise, stranded TES capture, we developed an "Omega-dT" primer that bypasses synthetic poly-T tracts, restoring high-quality sequencing directly at transcript termini. To eliminate both PCR background and phantom UMIs, we implemented a biochemical "multi-lock" mechanism utilizing uracil-containing TSOs and a uracil-intolerant DNA polymerase. We validate this approach using the logQ-slope, a novel metric that sensitively diagnoses UMI fidelity. Benchmarking reveals that while state-of-the-art methods still exhibit signs of UMI inflation, ESPeR-seq strictly prevents it. Furthermore, the strandedness and precise end-delineation provided by TSO and dT reads support robust de novo gene model reconstruction, enabling the discovery of novel multi-exon genes, unannotated 3′ UTR extensions, and candidate eRNAs across aggregated single-cell populations. Thus, ESPeR-seq establishes a robust framework for absolute quantitative accuracy and full-length isoform resolution.

14
CMS: Achieving Uniform and High-Quality Sequencing across Challenging Non-canonical Genomic Regions

Li, Q.; Liu, L.; Lin, Q.; Dan, X.; Jiang, Y.; Wei, Y.; Yang, M.; Peng, X.; Luo, W.; Wang, W.; Xu, D.; Huang, Z.; Sun, W.; Zhao, L.; Yan, Q.; Sun, L.; Feng, B.

2026-04-28 genomics 10.64898/2026.04.24.720553 medRxiv
Top 0.1%
39.9%

High-throughput sequencing is essential in modern biological research, yet low-complexity sequences remain challenging because they form structurally complex, non-canonical (non-B) DNA conformations that impede sequencing enzyme read-through. This leads to a long-standing trade-off: maximizing coverage introduces false positives (FP), while stringent filtering causes coverage loss and false negatives (FN). To address this, we developed CMS (Cross Mountains and Seas) on GeneMind sequencing platforms by optimizing its chemistry and enzymatic systems to traverse these secondary structures with high fidelity. Benchmarking across whole-genome (WGS) and whole-exome (WES) sequencing demonstrates that CMS addresses the trade-off by simultaneously enhancing both coverage uniformity and accuracy, notably achieving an approximately 100-fold reduction in low-coverage bins for WGS and a 70% reduction in FN insertions/deletions (INDELs) within complex non-B regions. Specifically, a synthetic G-quadruplex (G4) motif sequencing experiment demonstrates that CMS maintains a 1:1 strand ratio, effectively handling G4-induced biases where benchmarked platforms exhibit extensive depletion. These findings establish CMS as a reliable technology for the precise characterization of structurally challenging but functionally essential genomic regions.

15
Ultra-efficient, unified discovery from microbial sequencing with SPLASH and precise statistical assembly

Henderson, G.; Gudys, A.; Baharav, T.; Sundaramurthy, P.; Kokot, M.; Wang, P. L.; Deorowicz, S.; Carey, A.; Salzman, J.

2024-01-22 bioinformatics 10.1101/2024.01.18.576133 medRxiv
Top 0.1%
39.8%

Bacteria comprise more than 12% of Earth's biomass and profoundly impact human and planetary health [1]. Many key biological functions of microbes, and functions differentiating strains, are conferred or modified by genome plasticity, including mobilization of genetic elements, phage integration, and CRISPR arrays. Characterizing each of these processes is time-consuming and requires custom bioinformatic workflows that are ill-suited to discovering new sources of genetic diversity or to uncovering which elements are active. Further, strain typing of bacterial species and approaches to discriminate sub-populations remain time-consuming and resource-intensive. Here, we show that SPLASH, our published approach for reference-free discovery and analysis directly from raw reads, together with compactors, an improved statistical assembly algorithm, unifies diverse tasks in microbial sequence analysis: discovering new mobile elements and CRISPR arrays missing from any reference, and generating rapid, metadata-free strain typing of diverse bacteria. Together, SPLASH and compactors constitute a new general tool for biological discovery in the microbial world.

16
Compound Delivery of eVLPs Enhances Prime Editing for Targeted Genome Engineering and High-Throughput Screening

Langley, J.; Baudrier, L.; Curry, J.; Narta, K.; Todesco, H. M.; Potts, K.; Morrissy, S.; Mahoney, D. J.; Billon, P.

2025-08-11 genomics 10.1101/2025.08.11.669692 medRxiv
Top 0.1%
39.5%

Engineered virus-like particles (eVLPs) enable transgene-free ribonucleoprotein delivery for genome editing applications, yet optimized delivery strategies for high-throughput applications remain unexplored. Prime editing enables precise genomic modifications but suffers from limited efficiency that constrains its widespread adoption. Here, we present PRIME-VLP (Progressive Repeated Infections for Maximized Editing via Virus-Like Particles), a delivery strategy that enhances prime editing efficiency for both targeted genome engineering and high-throughput prime editing screening. PRIME-VLP leverages the temporal dynamics of eVLP-mediated editing through multiple sequential transductions with sub-saturating eVLP doses delivered at optimal intervals. This approach achieves 1.5- to 2.8-fold improvements in editing efficiency across diverse genomic targets and cell types. PRIME-VLP maintains high specificity without increasing off-target effects, compromising cellular viability or causing transcriptional perturbations. By decoupling pegRNA and editor delivery through pegRNA-free eVLPs, PRIME-VLP enables pooled prime editing screens, circumventing the transgene silencing limitations of conventional lentiviral-based screens. Using a 6,000-pegRNA library targeting TP53, PRIME-VLP achieved 2.8-fold higher editing efficiency and improved reproducibility compared to conventional lentiviral delivery. An eVLP-based screen identified functional TP53 loss-of-function variants that confer resistance to MDM2 inhibition by Nutlin-3. This work expands the versatility of eVLPs beyond their current in vivo therapeutic applications, demonstrating their promise for high-throughput functional genomics by overcoming the delivery limitations of lentiviral systems.

17
CROPseq-multi: a versatile solution for multiplexed perturbation and decoding in pooled CRISPR screens

Walton, R. T.; Qin, Y.; Blainey, P. C.

2024-03-17 genomics 10.1101/2024.03.17.585235 bioRxiv
Top 0.1%
39.1%

Forward genetic screens seek to dissect complex biological systems by systematically perturbing genetic elements and observing the resulting phenotypes. While standard screening methodologies introduce individual perturbations, multiplexing perturbations improves the performance of single-target screens and enables combinatorial screens for the study of genetic interactions. Current tools for multiplexing perturbations are limited by technical challenges and do not offer compatibility across diverse screening methodologies, including enrichment, single-cell sequencing, and optical pooled screens. Here, we report the development of CROPseq-multi (CSM), a CROPseq-inspired (ref. 1) lentiviral system to multiplex Streptococcus pyogenes (Sp) Cas9-based perturbations with versatile readout compatibility and high performance for both perturbation and barcode identification. CSM has per-guide activity equivalent to CROPseq and low lentiviral recombination frequencies. Dual-guide CSM libraries are constructed in a single, facile molecular cloning step that facilitates the use of unique molecular identifiers. CSM is compatible with enrichment screening methodologies, single-cell RNA-sequencing readouts, and optical pooled screens. For optical pooled screens, an optimized and multiplexed in situ detection protocol improves barcode counts 10-fold (for mRNA detection), enables detection of recombination events, and reduces the number of sequencing cycles required for decoding by 3-fold relative to CROPseq. CROPseq-multi-v2 (CSMv2) adds compatibility with detection methods based on T7 RNA polymerase in vitro transcription (refs. 2-5). CSM provides a single system for CRISPR screens that is compatible with individual and combinatorial perturbations, diverse SpCas9-based perturbation technologies, and multiple high-content, single-cell phenotypic readouts.
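The decoding-cycle figure rests on a simple counting bound. If each in situ sequencing cycle reads one of four bases, k cycles can distinguish at most 4^k barcodes, and a library is decodable in k cycles only if every barcode has a unique length-k prefix. The sketch below checks both conditions. It is a generic illustration, not CSM's barcode design; the function names are hypothetical, and real designs typically spend extra cycles on error tolerance (a minimum Hamming distance between barcodes), which this sketch ignores.

```python
from typing import Iterable


def min_decoding_cycles(n_barcodes: int, alphabet_size: int = 4) -> int:
    # Smallest k with alphabet_size ** k >= n_barcodes (counting lower bound).
    if n_barcodes < 1:
        raise ValueError("need at least one barcode")
    if alphabet_size < 2:
        raise ValueError("alphabet_size must be at least 2")
    cycles, capacity = 0, 1
    while capacity < n_barcodes:
        capacity *= alphabet_size
        cycles += 1
    return cycles


def decodable_in(barcodes: Iterable[str], cycles: int) -> bool:
    # True if every barcode has a distinct prefix of length `cycles`,
    # i.e. reading only that many bases still identifies each barcode.
    codes = list(barcodes)
    return len({code[:cycles] for code in codes}) == len(codes)


print(min_decoding_cycles(1000))                      # bound for a 1,000-barcode library
print(decodable_in(["ACGT", "ACTT", "GGGG"], 2))      # "AC" prefix collides
```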

18
Denoising sparse microbial signals from single-cell sequencing of mammalian host tissues

Ghaddar, B.; Blaser, M. J.; De, S.

2022-06-30 genomics 10.1101/2022.06.29.498176 bioRxiv
Top 0.1%
38.9%

We developed SAHMI, a computational resource to identify truly present microbial nucleic acids and filter contaminants and spurious false-positive taxonomic assignments from standard transcriptomic sequencing of mammalian tissues. In benchmark studies, SAHMI correctly identifies known microbial infections present in diverse tissues. The application of SAHMI to single-cell and spatial genomic data enables co-detection of somatic cells and microorganisms and joint analysis of host-microbiome ecosystems.
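To make "filtering contaminants and spurious taxonomic assignments" concrete, here is a deliberately generic sketch: it drops taxa supported by too few reads and taxa that are not clearly enriched over a negative control after normalizing by library size. This is not SAHMI's procedure, which the abstract does not spell out; the function name, the `min_reads` and `min_fold` thresholds, and the use of a single negative control are all assumptions for illustration.

```python
from typing import Dict


def filter_taxa(
    sample_counts: Dict[str, int],
    control_counts: Dict[str, int],
    sample_total: int,
    control_total: int,
    min_reads: int = 10,
    min_fold: float = 5.0,
) -> Dict[str, int]:
    # Hypothetical contaminant filter (not SAHMI's algorithm).
    if sample_total <= 0 or control_total <= 0:
        raise ValueError("library sizes must be positive")
    kept: Dict[str, int] = {}
    for taxon, count in sample_counts.items():
        # Drop taxa supported by too few reads to be trusted.
        if count < min_reads:
            continue
        # Compare per-read rates so different library sizes are comparable.
        sample_rate = count / sample_total
        control_rate = control_counts.get(taxon, 0) / control_total
        # Keep taxa absent from the control, or enriched well above it.
        if control_rate == 0 or sample_rate >= min_fold * control_rate:
            kept[taxon] = count
    return kept
```

A taxon seen at a similar rate in the control is treated as likely contamination, which is the intuition behind using negative controls; the thresholds here are placeholders.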

19
MOSHPIT: accessible, reproducible metagenome data science on the QIIME 2 framework

Ziemski, M.; Gehret, L.; Simard, A.; Castro Dau, S.; Risch, V.; Grabocka, D.; Matzoros, C.; Wood, C.; Momo Cabrera, P.; Hernandez-Velazquez, R.; Herman, C.; Evans, K.; Robeson, M. S.; Bolyen, E.; Caporaso, J. G.; Bokulich, N. A.

2025-02-21 bioinformatics 10.1101/2025.01.27.635007 bioRxiv
Top 0.1%
37.7%

Metagenome sequencing has revolutionized functional microbiome analysis across diverse ecosystems, but is fraught with technical hurdles. We introduce MOSHPIT (https://moshpit.readthedocs.io), software built on the QIIME 2 framework (Q2F) that integrates best-in-class CAMI2-validated metagenome tools with robust provenance tracking and multiple user interfaces, enabling streamlined, reproducible metagenome analysis for all expertise levels. By building on Q2F, MOSHPIT enhances scalability, interoperability, and reproducibility in complex workflows, democratizing and accelerating discovery at the frontiers of metagenomics.
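Provenance tracking means every result records which action, parameters, and inputs produced it, so any output can be traced back to its raw data. The sketch below shows the idea with content-derived IDs: identical action, parameters, and parents always yield the same ID, and `lineage` walks parents back to the root. It is a toy illustration, not QIIME 2's or MOSHPIT's provenance format; the action names (`import_reads`, `assemble`, `bin`) and parameters are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Dict, List, Set, Tuple


@dataclass(frozen=True)
class Result:
    action: str
    parents: Tuple[str, ...]
    uid: str


# In-memory store of every recorded step, keyed by its ID.
REGISTRY: Dict[str, Result] = {}


def record_step(action: str, params: Dict[str, Any], parents: List[Result]) -> Result:
    # Derive the ID from action, parameters, and parent IDs, so the same
    # step on the same inputs always receives the same ID.
    parent_ids = [p.uid for p in parents]
    payload = json.dumps(
        {"action": action, "params": params, "parents": parent_ids}, sort_keys=True
    )
    uid = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]
    result = Result(action, tuple(parent_ids), uid)
    REGISTRY[uid] = result
    return result


def lineage(uid: str) -> List[str]:
    # Depth-first walk that lists actions from the earliest input to `uid`.
    seen: Set[str] = set()
    order: List[str] = []

    def visit(node: str) -> None:
        if node in seen:
            return
        seen.add(node)
        for parent in REGISTRY[node].parents:
            visit(parent)
        order.append(REGISTRY[node].action)

    visit(uid)
    return order


reads = record_step("import_reads", {"path": "reads.fastq"}, [])
assembly = record_step("assemble", {"k": 21}, [reads])
bins = record_step("bin", {"min_contig": 1500}, [assembly])
print(lineage(bins.uid))
```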

20
High-coverage, massively parallel sequencing of single-cell genomes with CAP-seq

Li, M.; Zhai, X.; Li, J.; Li, S.; Du, Y.; Zhang, J.; Zhang, R.; Luo, Y.; Wei, W.; Liu, Y.

2024-09-11 genomics 10.1101/2024.09.10.612220 bioRxiv
Top 0.1%
37.5%

Microbial communities are extraordinarily diverse and play crucial roles in health and disease, yet current methods lack the resolution and scalability needed to dissect their genomic and ecological complexity at the single-cell level. Here, we present CAP-seq, a high-throughput single-microbe genomics platform that combines hydrogel-based semi-permeable encapsulation with minimal microfluidics to recover thousands of single-amplified genomes (SAGs) with long reads and high completeness at low sequencing depth. We benchmarked CAP-seq using defined microbial communities, demonstrating strain-level resolution, accurate detection of rare taxa, and genome recovery exceeding 50% at ~10x coverage. Applying CAP-seq to pediatric Clostridioides difficile infection microbiomes, we generated a high-resolution single-cell atlas comprising tens of thousands of SAGs across hundreds of species. Host-resolved profiling of the cryptic plasmid pBI143 revealed previously hidden low-abundance host associations, six new plasmid versions, and their coexistence within individuals, indicating complex plasmid evolution in situ. Longitudinal analysis during fecal microbiota transplantation and vancomycin treatment uncovered dynamic remodeling of microbial hosts, antimicrobial resistance genes, and plasmids at single-cell resolution. CAP-seq enables scalable, high-performance single-cell genomics and provides a practical, widely accessible platform for microbiome analysis, paving the way for large-scale exploration of microbial dark matter and host-microbe interactions across diverse ecosystems.
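Why "50% recovery at ~10x" is a meaningful benchmark becomes clearer when it is compared with the idealized case. With perfectly uniform (Poisson) read placement at mean depth c, the expected fraction of bases covered at least once is 1 - e^(-c), which is nearly 100% at 10x, so shortfalls come mainly from uneven coverage such as that introduced by single-cell whole-genome amplification. The sketch below computes both the Poisson expectation and the observed breadth from a per-base depth array. The function names are hypothetical, and "genome recovery" in the preprint may be defined differently (for example, by marker-gene completeness); this sketch measures breadth of coverage only.

```python
import math
from typing import Sequence


def poisson_breadth(mean_depth: float, min_depth: int = 1) -> float:
    # Expected fraction of bases with depth >= min_depth when per-base
    # depth is Poisson(mean_depth), i.e. perfectly uniform sampling.
    if mean_depth < 0 or min_depth < 0:
        raise ValueError("mean_depth and min_depth must be non-negative")
    term = math.exp(-mean_depth)  # P(depth == 0)
    below = 0.0
    for i in range(min_depth):
        below += term                     # accumulate P(depth == i)
        term *= mean_depth / (i + 1)      # advance to P(depth == i + 1)
    return 1.0 - below


def observed_breadth(depths: Sequence[int], min_depth: int = 1) -> float:
    # Fraction of positions whose observed depth reaches min_depth.
    if not depths:
        raise ValueError("depths must be non-empty")
    return sum(1 for d in depths if d >= min_depth) / len(depths)


# Same 10x mean depth, but all reads pile onto half the genome.
skewed = [20] * 50 + [0] * 50
print(poisson_breadth(10.0), observed_breadth(skewed))
```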